
feat: vLLM backend #2010

Draft · wants to merge 93 commits into base: dev

Conversation

@gau-nernst (Contributor) commented on Feb 21, 2025

Describe Your Changes

High-level design

  • vLLM is an inference engine designed for large-scale deployments (many GPUs)
  • cortex will spawn a vLLM subprocess and route requests to it (see the sketch below)
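
For illustration only, a minimal Python sketch of the routing idea, assuming the vLLM subprocess exposes its OpenAI-compatible server on localhost (the port and model name are placeholders, not what cortex actually uses; the real routing lives in cortex's C++ code):

```python
import requests

# Assumed: the spawned vLLM subprocess serves an OpenAI-compatible API on this port.
VLLM_BASE_URL = "http://127.0.0.1:8000"

def route_chat_completion(payload: dict) -> dict:
    """Forward an OpenAI-style chat completion request to the vLLM server unchanged."""
    resp = requests.post(f"{VLLM_BASE_URL}/v1/chat/completions", json=payload, timeout=600)
    resp.raise_for_status()
    return resp.json()

# The client talks to cortex with the usual OpenAI request shape; cortex just relays it.
print(route_chat_completion({
    "model": "Qwen/Qwen2.5-0.5B-Instruct",
    "messages": [{"role": "user", "content": "Hello"}],
}))
```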

cortex engines install vllm

  • Download uv to cortexcpp/python_engines/bin/uv if uv is not already installed
  • (via uv) Set up a venv at cortexcpp/python_engines/envs/vllm/<version>/.venv
  • (via uv) Download vllm and its dependencies
  • Known issues:
    • Progress streaming is not supported (the download is done via uv instead of DownloadService).
    • The install is not async, since we need to wait for the subprocess to finish (we may need a new SubprocessService in the future that handles an async WaitProcess()).
    • Hence, stopping and resuming a download also does not work.

Note:

  • All cached Python packages are stored in cortexcpp/python_engines/cache/uv, so that when we remove the python_engines folder, we can be sure nothing is left behind.
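
For reference, a rough Python sketch of the install flow above driven through uv's CLI. The vllm version, the exact uv invocations, and the Linux-style venv path are illustrative assumptions; in cortex this is done from C++ via subprocesses:

```python
import os
import subprocess
from pathlib import Path

PYTHON_ENGINES = Path("cortexcpp/python_engines")
UV_BIN = PYTHON_ENGINES / "bin" / "uv"      # uv is downloaded here if not already installed
VLLM_VERSION = "0.7.3"                       # placeholder version
VENV_DIR = PYTHON_ENGINES / "envs" / "vllm" / VLLM_VERSION / ".venv"

# Keep uv's package cache inside python_engines so deleting that folder leaves nothing behind.
env = {**os.environ, "UV_CACHE_DIR": str(PYTHON_ENGINES / "cache" / "uv")}

# 1. Create the per-version venv.
subprocess.run([str(UV_BIN), "venv", str(VENV_DIR)], env=env, check=True)

# 2. Install vllm and its dependencies into that venv.
subprocess.run(
    [str(UV_BIN), "pip", "install", f"vllm=={VLLM_VERSION}",
     "--python", str(VENV_DIR / "bin" / "python")],
    env=env,
    check=True,
)
```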

cortex models start <model>

  • Spawn vllm serve
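
A hedged Python sketch of the spawn-and-wait step, assuming the per-version venv layout above; the model name, port, and readiness polling are placeholders rather than cortex's actual behavior:

```python
import subprocess
import time
import requests

VLLM_BIN = "cortexcpp/python_engines/envs/vllm/0.7.3/.venv/bin/vllm"  # assumed layout
MODEL = "Qwen/Qwen2.5-0.5B-Instruct"                                   # placeholder model
PORT = 8000

# Spawn `vllm serve`, which starts vLLM's OpenAI-compatible HTTP server for the model.
proc = subprocess.Popen([VLLM_BIN, "serve", MODEL, "--port", str(PORT)])

# Poll the server's /health endpoint until the model has finished loading.
for _ in range(300):  # wait up to ~5 minutes
    try:
        if requests.get(f"http://127.0.0.1:{PORT}/health", timeout=1).status_code == 200:
            break
    except requests.exceptions.ConnectionError:
        pass
    time.sleep(1)
```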

TODO:

  • cortex engines install vllm (TODO: async install in a separate thread)
  • Set default engine variant
  • cortex engines load vllm
  • cortex engines list
  • cortex engines uninstall vllm: delete cortexcpp/python_engines/envs/vllm/<version>
  • cortex pull <model>
  • cortex models list
  • cortex models start <model>: spawn vllm serve
  • cortex models stop <model>
  • cortex ps
  • Chat completion
    • Non-streaming
    • Streaming (see the sketch after this list)
  • Embeddings
  • cortex run
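
For the streaming item above, vLLM's OpenAI-compatible server emits server-sent events, which cortex could relay to the client line by line. A minimal Python sketch, with the endpoint, port, and model as assumptions:

```python
import requests

def stream_chat_completion(payload: dict):
    """Yield raw SSE lines ('data: {...}', ending with 'data: [DONE]') from vLLM unchanged."""
    with requests.post(
        "http://127.0.0.1:8000/v1/chat/completions",
        json={**payload, "stream": True},
        stream=True,
        timeout=600,
    ) as resp:
        resp.raise_for_status()
        for line in resp.iter_lines(decode_unicode=True):
            if line:  # skip SSE keep-alive blank lines
                yield line

for line in stream_chat_completion({
    "model": "Qwen/Qwen2.5-0.5B-Instruct",
    "messages": [{"role": "user", "content": "Hello"}],
}):
    print(line)
```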

Fixes Issues

Self Checklist

  • Added relevant comments, esp in complex areas
  • Updated docs (for bug fixes / features)
  • Created issues for follow-up changes or refactoring needed

@gau-nernst gau-nernst moved this from Icebox to In Progress in Menlo Mar 20, 2025
@gau-nernst gau-nernst mentioned this pull request Mar 22, 2025
Development

Successfully merging this pull request may close these issues.

vLLM backend for Cortex